World leaders are influential people that have the power to impact millions of people’s lives. Some leaders are elected for terms while others serve for life, so their expected survival is of great interest. In this paper, we seek to compare Popes, US Presidents, Dalai Lamas, Chinese Emperors, and Japanese Emperors to see how their lifespans compare. Additionally, we want to analyze the impact on survival of a leader’s birth year, and whether or not this impact might vary depending on the type of leadership. After we build a model to analyze the effects, we also wish to look at some specific cases and determine the probability that the current 14th Dalai Lama will outlive Pope Francis and the probability that President Obama will outlive Emperor Naruhito. We will accomplish this by using survival analysis. We use Bayesian Inference to estimate the model parameters with the help of the JAGS program.
The rest of the paper is structured as follows. First, we will describe the data used to carry out our analysis. Next, we will describe the methods used in our analysis plan. Then, we will show how we carried out our analysis plan and recount our results. After that, we discuss our conclusions and areas for further research. Finally, our code and some additional information can be foudn in the Appendix.
The data used in this analysis contains entries for 177 different world leaders. The types of world leaders present in the data include Popes, US Presidents, Dalai Lamas, Chinese Emperors, and Japanese Emperors. Each leader’s birth date type of leader is recorded. For some of these groups, we have data dating all the way back to the 14th century. Additionally, leaders who have passed away also have their death date and age of death recorded. For leaders that are still living, these columns instead contain the date the dataset was created (July 31, 2020) and their current age on that date. In order to further clarify who is dead or alive, there is a column titled “Censored,” which takes a value of 0 if the person is still dead and a value of 1 if that person is still alive. Related to this, there is another column called “Fail,” which takes on a value of 1 if the person is dead and a value of 0 if the person is alive. In total, there are 11 living leaders in our dataset, 4 of whose age of death we are trying to predict.
We use survival analysis as a way to model \(T_i\), the lifespan of a given leader i depending on their year of birth and type of leadership. Survival analysis is useful in that it allows the consideration of “censored” data. This means that we do not always observe the outcome for each data point, for example, we do not know the death date of leaders that are still alive. Thus, we do not know their lifespan, we just know that their survival time \(T_i\) will be greater than their current age.
We model the lifespan \(T_i\) of an individual i after the Weibull distribution specified below. We choose to use the Weibull distribution because it is often utilized in survival analysis and allows the user to specify a flexible shape parameter for the distribution. The first parameter \(r\), also known as the shape parameter, is a positive scalar, and the second parameter \(\mu\), the scale parameter, is a linear function of the covariates (birth year and types of leadership as well as their interactions). \[ T_i \sim Weibull(\alpha, \mu) \] \[\log(\mu_i) = \beta_0+\beta_1 (Year \ of \ Birth_i)+\beta_2 (Leadership_i = UsPres) + \beta_3 (Leadership_i = ChinaEmp) \] \[+ \beta_4 (Leadership_i = DalaiLama) +\beta_5 (Leadership_i = JapanEmp) \\\] \[+ \beta_6 (Year \ of \ Birth_i * (Leadership_i = UsPres)) \\\] \[+ \beta_7 (Year \ of \ Birth_i * (Leadership_i = ChinaEmp)) \\\] \[+ \beta_8 (Year \ of \ Birth_i * (Leadership_i = DalaiLama)) \\\] \[+ \beta_9 (Year \ of \ Birth_i * (Leadership_i = JapanEmp))\] where Leadership_i = Popes is the baseline for comparison.
To handle censored observations, we specify their contribution to the likelihood function using the “zeroes trick.” Since the sampling distribution of the censored observations is not known to be a standard distribution, we use the Poisson “zeroes trick” where the likelihood of a Poisson(phi) observation equal to zero is exp(-phi), and when observation[i] has a likelihood of L[i], then phi[i] is assigned to -log(L[i]) if our observed data is a set of 0’s. This is necessary because we don’t know the lifespan of currently living people, so we need a way to model their expected survival times. This method is also generally more computationally efficient (Stander et al, 2018).
We assume that our priors are independent because intuitively a given observation could not be two types of leaders. Additionally, we do not have sufficiently strong prior knowledge about the relationship between birth year and type of leadership to specify an informative prior. Therefore, we use uninformative priors for the betas in our model.
For our prior for r, we chose a prior of \(exp(1)\), as we believe the hazard of death increases with time.
#library(R2jags)
library(R2jags)
library(tidyverse)
library(lubridate)
library(knitr)
library(broom)
leaders_data <- get(load("Leaders.RData"))
leaders_data$Censored[leaders$Name == "Jianwen Emperor"] <- "0" #typo for Jianwen Emperor
leaders_data$Censored <- as.factor(leaders_data$Censored)
leaders_data$Type <- as.factor(leaders_data$Type)
leaders_data$Age.Event <- as.numeric(leaders_data$Age.Event)
summary(leaders_data)
## Name Birth.Date Event.Date Age.Event
## Length:177 Min. :1324-05-14 Min. :1368-03-29 Min. : 9.259
## Class :character 1st Qu.:1507-09-16 1st Qu.:1572-07-05 1st Qu.:53.985
## Mode :character Median :1705-10-31 Median :1758-05-03 Median :67.529
## Mean :1674-01-23 Mean :1737-03-27 Mean :63.170
## 3rd Qu.:1833-08-20 3rd Qu.:1886-11-18 3rd Qu.:78.242
## Max. :1961-08-04 Max. :2020-07-31 Max. :95.830
## Censored Type Fail centdate time
## 0:167 ChinaEmp :26 Min. :0.0000 Min. :1300-01-01 Min. :0.2436
## 1: 10 DalaiLama:14 1st Qu.:1.0000 1st Qu.:1300-01-01 1st Qu.:2.0770
## JapanEmp :30 Median :1.0000 Median :1300-01-01 Median :4.0582
## Pope :63 Mean :0.9379 Mean :1300-01-01 Mean :3.7406
## UsPres :44 3rd Qu.:1.0000 3rd Qu.:1300-01-01 3rd Qu.:5.3362
## Max. :1.0000 Max. :1300-01-01 Max. :6.6157
ggplot(data = leaders_data) +
geom_boxplot(aes(x = Type, y = Age.Event))
ggplot(data = leaders_data) +
geom_point(aes(x = Birth.Date, y = Age.Event))
\[ T_i \sim Weibull(\alpha, \mu) \\ log(\mu) = \beta_0 + \beta_1x_{i1}+\beta_2x_{i2}+...+\beta_px_{ip}\\ \] Log of mu is done so that the mu parameter is always positive (the support on mu in Weibull is that mu > 0)
\[ T_i \sim Weibull(r, \mu_i) \\ \log(\mu_i) = \beta_0+\beta_1 \text{(Year of Birth_i)}+\beta_2 \text{(Leadership_i = UsPres)} + \beta_3\text{(Leadership_i = ChinaEmp)} \\ + \beta_4\text{(Leadership_i = DalaiLama)} +\beta_5\text{(Leadership_i = JapanEmp)} \\ + \beta_6\text{(Year of Birth_i * (Leadership_i = UsPres))} \\ + \beta_7\text{(Year of Birth_i * (Leadership_i = ChinaEmp))} \\ + \beta_8\text{(Year of Birth_i * (Leadership_i = DalaiLama))} \\ + \beta_9\text{(Year of Birth_i * (Leadership_i = JapanEmp))} \]
# 10 censored data
leaders_data$Name[leaders_data$Censored == "1"]
## [1] "Francis" "Benedict XVI"
## [3] "Jimmy Carter" "Bill Clinton"
## [5] "George W. Bush" "Barack Obama"
## [7] "Donald Trump" "14th Dalai Lama (Tenzin Gyatso)"
## [9] "Akihito (Emperor Heisei)" "Naruhito (Emperor Reiwa)"
leaders_data$Birth.Year <- year(leaders_data$Birth.Date)
leaders_data <- leaders_data %>%
mutate(TypeChinaEmp = as.factor(case_when(Type == "ChinaEmp" ~ 1,
TRUE ~ 0)),
TypeDalaiLama = as.factor(case_when(Type == "DalaiLama" ~ 1,
TRUE ~ 0)),
TypeJapanEmp = as.factor(case_when(Type == "JapanEmp" ~ 1,
TRUE ~ 0)),
TypePope = as.factor(case_when(Type == "Pope" ~ 1,
TRUE ~ 0)),
TypeUsPres = as.factor(case_when(Type == "UsPres" ~ 1,
TRUE ~ 0)))
leaders_nopred <- leaders_data %>%
filter(!(Name %in% c("14th Dalai Lama (Tenzin Gyatso)", "Naruhito (Emperor Reiwa)", "Barack Obama", "Francis")))
survival_raw <- leaders_nopred$Age.Event
n <- length(survival_raw)
censored <- leaders_nopred$Censored
survival <- ifelse(censored == 0, survival_raw, NA)
censoring_limits <- ifelse(censored == 0, 100, survival_raw)
x_1 <- leaders_nopred$Birth.Year - mean(leaders_nopred$Birth.Year)
x_2 <- leaders_nopred$TypeUsPres
x_3 <- leaders_nopred$TypeChinaEmp
x_4 <- leaders_nopred$TypeDalaiLama
x_5 <- leaders_nopred$TypeJapanEmp
# add interaction b/w Birth.Year and Type
x_6 <- x_1*as.numeric(as.character(x_2))
x_7 <- x_1*as.numeric(as.character(x_3))
x_8 <- x_1*as.numeric(as.character(x_4))
x_9 <- x_1*as.numeric(as.character(x_5))
set.seed(123)
build_model <- function() {
for(i in 1:n_censored) {
z_censored[i] ~ dpois(phi_censored[i])
phi_censored[i] <- mu_censored[i] * pow(t_censored[i], r)
mu_censored[i] <- exp(beta_censored[i])
beta_censored[i] <- beta_0 + beta_1*x_1_censored[i] + beta_2*x_2_censored[i] + beta_3*x_3_censored[i] + beta_4*x_4_censored[i] + beta_5*x_5_censored[i] + beta_6*x_6_censored[i] + beta_7*x_7_censored[i] + beta_8*x_8_censored[i] + beta_9*x_9_censored[i]
}
for(j in 1:n_non_censored) { #total rows - 6 ** 6 are censored in leaders_nopred
survival_non_censored[j] ~ dweib(r, mu[j])
mu[j] <- exp(beta[j])
beta[j] <- beta_0 + beta_1*x_1_non_censored[j] + beta_2*x_2_non_censored[j] + beta_3*x_3_non_censored[j] + beta_4*x_4_non_censored[j] + beta_5*x_5_non_censored[j] + beta_6*x_6_non_censored[j] + beta_7*x_7_non_censored[j] + beta_8*x_8_non_censored[j] + beta_9*x_9_non_censored[j]
}
beta_0 ~ dnorm(0.0, 1.0E-3) # Prior on beta_0 is normal with low precision
beta_1 ~ dnorm(0.0, 1.0E-3) # Prior on beta_1 is normal with low precision
beta_2 ~ dnorm(0.0, 1.0E-3) # Prior on beta_2 is normal with low precision
beta_3 ~ dnorm(0.0, 1.0E-3) # Prior on beta_3 is normal with low precision
beta_4 ~ dnorm(0.0, 1.0E-3) # Prior on beta_4 is normal with low precision
beta_5 ~ dnorm(0.0, 1.0E-3) # Prior on beta_5 is normal with low precision
beta_6 ~ dnorm(0.0, 1.0E-3) # Prior on beta_6 is normal with low precision
beta_7 ~ dnorm(0.0, 1.0E-3) # Prior on beta_7 is normal with low precision
beta_8 ~ dnorm(0.0, 1.0E-3) # Prior on beta_8 is normal with low precision
beta_9 ~ dnorm(0.0, 1.0E-3) # Prior on beta_9 is normal with low precision
r ~ dexp(1) # Prior on r
# sensitivity analysis priors
# beta_0 ~ dnorm(0.5, 1.0E-2) # Prior on beta_0 is normal with low precision
# beta_1 ~ dnorm(0.5, 1.0E-2) # Prior on beta_1 is normal with low precision
# beta_2 ~ dnorm(0.5, 1.0E-2) # Prior on beta_2 is normal with low precision
# beta_3 ~ dnorm(0.5, 1.0E-2) # Prior on beta_3 is normal with low precision
# beta_4 ~ dnorm(0.5, 1.0E-2) # Prior on beta_4 is normal with low precision
# beta_5 ~ dnorm(0.5, 1.0E-2) # Prior on beta_5 is normal with low precision
# beta_6 ~ dnorm(0.5, 1.0E-2) # Prior on beta_6 is normal with low precision
# beta_7 ~ dnorm(0.5, 1.0E-2) # Prior on beta_7 is normal with low precision
# beta_8 ~ dnorm(0.5, 1.0E-2) # Prior on beta_8 is normal with low precision
# beta_9 ~ dnorm(0.5, 1.0E-2) # Prior on beta_9 is normal with low precision
# r ~ dexp(1) # Prior on r
# Define alphas
alpha_0 <- - beta_0 / r
alpha_1 <- - beta_1 / r
alpha_2 <- - beta_2 / r
alpha_3 <- - beta_3 / r
alpha_4 <- - beta_4 / r
alpha_5 <- - beta_5 / r
alpha_6 <- - beta_6 / r
alpha_7 <- - beta_7 / r
alpha_8 <- - beta_8 / r
alpha_9 <- - beta_9 / r
# Percentage increases
percentage_increase_year <- 100*(exp(alpha_1) - 1)
percentage_increase_UsPres <- 100*(exp(alpha_2) - 1)
percentage_increase_ChinaEmp <- 100*(exp(alpha_3) - 1)
percentage_increase_DalaiLama <- 100*(exp(alpha_4) - 1)
percentage_increase_JapanEmp <- 100*(exp(alpha_5) - 1)
percentage_increase_BYUsPres <- 100*(exp(alpha_6) - 1)
percentage_increase_BYChinaEmp <- 100*(exp(alpha_7) - 1)
percentage_increase_BYDalaiLama <- 100*(exp(alpha_8) - 1)
percentage_increase_BYJapanEmp <- 100*(exp(alpha_9) - 1)
# Predictive distribution of age at the new values
beta_Francis <- beta_0 + beta_1*year_Francis
mu_Francis <- exp(beta_Francis)
survival_Francis ~ dweib(r, mu_Francis) %_% T(present_length_Francis, upper_length)
age_Francis_predictive <- survival_Francis
beta_Obama <- beta_0 + (beta_1+beta_6)*year_Obama + beta_2
mu_Obama <- exp(beta_Obama)
survival_Obama ~ dweib(r, mu_Obama) %_% T(present_length_Obama, upper_length)
age_Obama_predictive <- survival_Obama
beta_Dalai <- beta_0 + (beta_1+beta_8)*year_Dalai + beta_4
mu_Dalai <- exp(beta_Dalai)
survival_Dalai ~ dweib(r, mu_Dalai) %_% T(present_length_Dalai, upper_length)
age_Dalai_predictive <- survival_Dalai
beta_Naruhito <- beta_0 + (beta_1+beta_9)*year_Naruhito + beta_5
mu_Naruhito <- exp(beta_Naruhito)
survival_Naruhito ~ dweib(r, mu_Naruhito) %_% T(present_length_Naruhito, upper_length)
age_Naruhito_predictive <- survival_Naruhito
}
censored_dex = which(censored==1) # index of ppl who are censored
z_censored <- rep(0, length(censored_dex))
t_censored <- censoring_limits[censored_dex]
x_1_censored <- x_1[censored_dex]
x_2_censored <- x_2[censored_dex]
x_3_censored <- x_3[censored_dex]
x_4_censored <- x_4[censored_dex]
x_5_censored <- x_5[censored_dex]
x_6_censored <- x_6[censored_dex]
x_7_censored <- x_7[censored_dex]
x_8_censored <- x_8[censored_dex]
x_9_censored <- x_9[censored_dex]
n_censored <- length(censored_dex)
survival_non_censored <- survival[-censored_dex] # Remove ALL ALIVE PEOPLE
x_1_non_censored <- x_1[-censored_dex]
x_2_non_censored <- x_2[-censored_dex]
x_3_non_censored <- x_3[-censored_dex]
x_4_non_censored <- x_4[-censored_dex]
x_5_non_censored <- x_5[-censored_dex]
x_6_non_censored <- x_6[-censored_dex]
x_7_non_censored <- x_7[-censored_dex]
x_8_non_censored <- x_8[-censored_dex]
x_9_non_censored <- x_9[-censored_dex]
n_non_censored <- length(survival_non_censored)
birth_year_Francis <- leaders_data$Birth.Year[leaders_data$Name == "Francis"]
year_Francis <- birth_year_Francis - mean(leaders_nopred$Birth.Year)
present_length_Francis <- leaders_data$Age.Event[leaders_data$Name =="Francis"]
birth_year_Obama <- leaders_data$Birth.Year[leaders_data$Name == "Barack Obama"]
year_Obama <- birth_year_Obama - mean(leaders_nopred$Birth.Year)
present_length_Obama <- leaders_data$Age.Event[leaders_data$Name =="Barack Obama"]
birth_year_Dalai <- leaders_data$Birth.Year[leaders_data$Name == "14th Dalai Lama (Tenzin Gyatso)"]
year_Dalai <- birth_year_Dalai - mean(leaders_nopred$Birth.Year)
present_length_Dalai <- leaders_data$Age.Event[leaders_data$Name =="14th Dalai Lama (Tenzin Gyatso)"]
birth_year_Naruhito <- leaders_data$Birth.Year[leaders_data$Name == "Naruhito (Emperor Reiwa)"]
year_Naruhito <- birth_year_Naruhito - mean(leaders_nopred$Birth.Year)
present_length_Naruhito <- leaders_data$Age.Event[leaders_data$Name =="Naruhito (Emperor Reiwa)"]
upper_length <- 100
data_build_model <- list("n_censored",
"n_non_censored",
"z_censored",
"t_censored",
"x_1_censored",
"x_2_censored",
"x_3_censored",
"x_4_censored",
"x_5_censored",
"x_6_censored",
"x_7_censored",
"x_8_censored",
"x_9_censored",
"survival_non_censored",
"x_1_non_censored",
"x_2_non_censored",
"x_3_non_censored",
"x_4_non_censored",
"x_5_non_censored",
"x_6_non_censored",
"x_7_non_censored",
"x_8_non_censored",
"x_9_non_censored",
"year_Francis",
"year_Obama",
"year_Dalai",
"year_Naruhito",
"present_length_Francis",
"present_length_Obama",
"present_length_Dalai",
"present_length_Naruhito",
"upper_length")
model_output <- jags(data = data_build_model,
parameters.to.save = c("beta_0",
"beta_1",
"beta_2",
"beta_3",
"beta_4",
"beta_5",
"beta_6",
"beta_7",
"beta_8",
"beta_9",
"r",
"alpha_0",
"alpha_1",
"alpha_2",
"alpha_3",
"alpha_4",
"alpha_5",
"alpha_6",
"alpha_7",
"alpha_8",
"alpha_9",
"percentage_increase_year",
"percentage_increase_UsPres",
"percentage_increase_ChinaEmp",
"percentage_increase_DalaiLama",
"percentage_increase_JapanEmp",
"percentage_increase_BYUsPres",
"percentage_increase_BYChinaEmp",
"percentage_increase_BYDalaiLama",
"percentage_increase_BYJapanEmp",
"age_Francis_predictive",
"age_Obama_predictive",
"age_Dalai_predictive",
"age_Naruhito_predictive"#,
#"survival_non_censored"
),
n.iter = 50000,
n.chains = 3,
model.file = build_model)
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 173
## Unobserved stochastic nodes: 15
## Total graph size: 2430
##
## Initializing model
model_output
## Inference for Bugs model at "/var/folders/hs/d7t9jp1x3t7cwq2xz5rf7f8h0000gn/T//Rtmpmyq4PV/model163e4e9db5db.txt", fit using jags,
## 3 chains, each with 50000 iterations (first 25000 discarded), n.thin = 25
## n.sims = 3000 iterations saved
## mu.vect sd.vect 2.5% 25% 50%
## age_Dalai_predictive 92.468 4.298 85.459 88.707 92.366
## age_Francis_predictive 92.682 4.674 84.286 88.823 93.058
## age_Naruhito_predictive 85.231 10.457 63.166 77.838 87.110
## age_Obama_predictive 84.391 11.078 61.557 76.479 86.250
## alpha_0 5.389 0.217 4.959 5.249 5.388
## alpha_1 0.000 0.000 0.000 0.000 0.000
## alpha_2 -0.145 0.102 -0.345 -0.211 -0.144
## alpha_3 -0.319 0.064 -0.444 -0.363 -0.318
## alpha_4 -0.404 0.077 -0.553 -0.458 -0.405
## alpha_5 -0.172 0.058 -0.285 -0.211 -0.172
## alpha_6 0.000 0.001 -0.001 0.000 0.000
## alpha_7 0.000 0.000 -0.001 0.000 0.000
## alpha_8 -0.002 0.000 -0.003 -0.002 -0.002
## alpha_9 0.000 0.000 -0.001 0.000 0.000
## beta_0 -22.592 1.468 -25.522 -23.552 -22.573
## beta_1 -0.002 0.001 -0.003 -0.002 -0.002
## beta_2 0.606 0.423 -0.242 0.330 0.606
## beta_3 1.336 0.259 0.832 1.159 1.342
## beta_4 1.690 0.317 1.044 1.485 1.704
## beta_5 0.721 0.237 0.251 0.569 0.721
## beta_6 -0.002 0.002 -0.007 -0.004 -0.002
## beta_7 0.000 0.001 -0.002 0.000 0.000
## beta_8 0.007 0.002 0.003 0.006 0.007
## beta_9 0.000 0.001 -0.002 -0.001 0.000
## percentage_increase_BYChinaEmp -0.011 0.033 -0.076 -0.032 -0.011
## percentage_increase_BYDalaiLama -0.162 0.042 -0.250 -0.189 -0.161
## percentage_increase_BYJapanEmp -0.006 0.030 -0.065 -0.026 -0.006
## percentage_increase_BYUsPres 0.049 0.058 -0.064 0.009 0.047
## percentage_increase_ChinaEmp -27.184 4.638 -35.869 -30.471 -27.262
## percentage_increase_DalaiLama -33.024 5.190 -42.501 -36.740 -33.301
## percentage_increase_JapanEmp -15.695 4.871 -24.815 -19.061 -15.813
## percentage_increase_UsPres -13.044 8.918 -29.152 -19.032 -13.446
## percentage_increase_year 0.039 0.018 0.005 0.027 0.039
## r 4.195 0.256 3.693 4.030 4.182
## deviance 1433.038 4.775 1425.893 1429.511 1432.229
## 75% 97.5% Rhat n.eff
## age_Dalai_predictive 96.114 99.608 1.001 3000
## age_Francis_predictive 96.830 99.677 1.001 2600
## age_Naruhito_predictive 94.227 99.367 1.001 3000
## age_Obama_predictive 93.851 99.480 1.002 1500
## alpha_0 5.528 5.832 1.003 900
## alpha_1 0.001 0.001 1.001 3000
## alpha_2 -0.078 0.060 1.001 3000
## alpha_3 -0.275 -0.198 1.001 3000
## alpha_4 -0.353 -0.250 1.003 970
## alpha_5 -0.134 -0.059 1.003 850
## alpha_6 0.001 0.002 1.001 3000
## alpha_7 0.000 0.001 1.001 3000
## alpha_8 -0.001 -0.001 1.002 2000
## alpha_9 0.000 0.001 1.001 3000
## beta_0 -21.640 -19.746 1.008 280
## beta_1 -0.001 0.000 1.001 3000
## beta_2 0.887 1.437 1.001 3000
## beta_3 1.513 1.826 1.003 1100
## beta_4 1.909 2.275 1.005 470
## beta_5 0.879 1.177 1.004 540
## beta_6 0.000 0.003 1.001 3000
## beta_7 0.001 0.003 1.001 3000
## beta_8 0.008 0.010 1.002 1300
## beta_9 0.001 0.003 1.001 3000
## percentage_increase_BYChinaEmp 0.012 0.053 1.001 3000
## percentage_increase_BYDalaiLama -0.133 -0.083 1.002 2000
## percentage_increase_BYJapanEmp 0.014 0.056 1.001 3000
## percentage_increase_BYUsPres 0.088 0.162 1.001 3000
## percentage_increase_ChinaEmp -24.027 -17.986 1.001 3000
## percentage_increase_DalaiLama -29.773 -22.153 1.003 920
## percentage_increase_JapanEmp -12.545 -5.761 1.003 840
## percentage_increase_UsPres -7.521 6.140 1.001 3000
## percentage_increase_year 0.051 0.076 1.001 3000
## r 4.356 4.723 1.004 550
## deviance 1435.825 1444.160 1.001 2600
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 11.4 and DIC = 1444.4
## DIC is an estimate of expected predictive error (lower deviance is better).
library(coda)
parameters = c("beta_0", "beta_1", "beta_2", "beta_3", "beta_4", "beta_5",
"beta_6", "beta_7", "beta_8", "beta_9", "r")
res = data.frame(model_output$BUGSoutput$sims.matrix)[,parameters]
colnames(res) <- gsub("^.*\\.","", colnames(res) )
# lapply(seq(1,length(res)), function(i) {
# traceplot(as.mcmc(res[,i]), smooth = FALSE, type = "l",
# xlab = "Iterations", ylab = colnames(res)[i],
# main = paste("Traceplot of ", colnames(res)[i]))
# })
tps <- function(var){
ggplot(res, aes_(y=as.name(var), x=seq(1,nrow(res)))) +
geom_line() +
labs(title=paste("Traceplot of ", as.name(var)),
x ="Iterations", y = as.name(var))
}
lapply(names(res), tps)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
##
## [[7]]
##
## [[8]]
##
## [[9]]
##
## [[10]]
##
## [[11]]
library(ggplot2)
res_lag1 = lapply(seq(1:length(res)), function(i) {
lres = lag(res[,i],1)
plot(y=res[,i], x= lres,
xlab = paste0(colnames(res)[i], "-1"),
ylab = paste0(colnames(res)[i]),
main = paste("Lag-1 Scatter Plot of", colnames(res)[i]))
})
lapply(seq(1,length(res)), function(i) {
acf(res[,i], xlab = "Lag", ylab = "ACF",
main = paste("ACF Plot of ", colnames(res)[i]))
})
## [[1]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 -0.003 -0.030 -0.003 0.047 0.004 0.004 -0.021 -0.018 0.013 0.011
## 11 12 13 14 15 16 17 18 19 20 21
## -0.004 0.013 -0.005 0.004 0.018 0.012 -0.019 -0.005 0.003 0.017 -0.017
## 22 23 24 25 26 27 28 29 30 31 32
## -0.015 -0.001 -0.027 0.017 0.007 -0.017 -0.046 -0.006 -0.009 0.028 0.015
## 33 34
## -0.020 -0.002
##
## [[2]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 -0.013 -0.011 -0.030 -0.002 0.030 -0.002 -0.011 -0.039 0.008 -0.030
## 11 12 13 14 15 16 17 18 19 20 21
## -0.019 -0.004 -0.017 -0.038 -0.002 0.015 0.010 -0.018 0.025 -0.005 -0.003
## 22 23 24 25 26 27 28 29 30 31 32
## -0.016 0.012 0.026 -0.003 -0.029 -0.021 0.004 0.031 -0.026 -0.008 -0.009
## 33 34
## -0.027 -0.018
##
## [[3]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 0.027 0.002 -0.020 -0.018 -0.006 0.014 0.008 0.010 0.016 0.008
## 11 12 13 14 15 16 17 18 19 20 21
## -0.022 -0.014 0.024 0.020 0.013 -0.020 -0.034 -0.022 -0.022 0.010 -0.018
## 22 23 24 25 26 27 28 29 30 31 32
## -0.010 0.019 -0.013 -0.048 -0.023 -0.009 0.023 -0.010 -0.027 0.004 -0.011
## 33 34
## -0.001 -0.010
##
## [[4]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 -0.001 0.007 -0.013 0.015 -0.032 0.036 -0.012 -0.008 -0.026 0.001
## 11 12 13 14 15 16 17 18 19 20 21
## -0.013 0.014 0.025 -0.008 -0.028 0.000 0.018 -0.021 0.000 0.024 -0.032
## 22 23 24 25 26 27 28 29 30 31 32
## -0.014 -0.003 0.027 0.011 0.027 0.003 0.001 0.012 -0.016 0.014 -0.025
## 33 34
## -0.003 0.004
##
## [[5]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 0.003 -0.017 -0.023 -0.006 -0.007 -0.049 -0.022 -0.017 -0.003 -0.001
## 11 12 13 14 15 16 17 18 19 20 21
## -0.018 0.002 -0.023 0.031 0.003 0.018 0.003 0.010 -0.026 0.036 -0.005
## 22 23 24 25 26 27 28 29 30 31 32
## -0.036 0.007 -0.013 0.020 -0.024 -0.017 0.007 -0.008 0.006 0.026 0.016
## 33 34
## -0.011 -0.013
##
## [[6]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 -0.018 0.011 -0.033 0.010 -0.003 0.025 0.002 -0.025 -0.014 -0.012
## 11 12 13 14 15 16 17 18 19 20 21
## 0.002 0.004 0.001 -0.003 0.011 -0.001 0.019 -0.018 0.003 0.002 -0.004
## 22 23 24 25 26 27 28 29 30 31 32
## -0.013 -0.005 -0.005 -0.004 -0.026 0.012 -0.052 -0.025 0.011 0.049 0.016
## 33 34
## 0.003 0.013
##
## [[7]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 0.014 -0.012 -0.013 -0.029 0.013 0.017 -0.013 0.021 0.009 -0.013
## 11 12 13 14 15 16 17 18 19 20 21
## -0.026 -0.005 0.021 0.010 0.030 -0.032 -0.011 -0.013 -0.011 0.001 -0.025
## 22 23 24 25 26 27 28 29 30 31 32
## -0.038 0.004 0.001 -0.030 -0.032 -0.038 0.028 -0.022 -0.025 -0.015 -0.005
## 33 34
## 0.011 0.014
##
## [[8]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 0.000 -0.019 -0.017 0.022 0.003 -0.008 -0.019 -0.002 -0.012 0.000
## 11 12 13 14 15 16 17 18 19 20 21
## 0.000 0.033 0.008 0.027 0.009 0.000 0.023 -0.004 0.022 0.000 0.016
## 22 23 24 25 26 27 28 29 30 31 32
## -0.003 -0.010 0.032 0.017 -0.004 0.015 -0.039 0.035 -0.008 0.016 -0.010
## 33 34
## -0.032 -0.014
##
## [[9]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 0.001 -0.015 0.036 -0.025 -0.002 0.006 -0.034 -0.018 0.002 -0.036
## 11 12 13 14 15 16 17 18 19 20 21
## -0.037 -0.001 -0.032 0.006 0.022 0.022 0.007 -0.023 -0.004 -0.023 -0.018
## 22 23 24 25 26 27 28 29 30 31 32
## 0.000 0.023 -0.006 0.012 0.019 -0.016 0.009 -0.011 -0.026 0.021 -0.032
## 33 34
## -0.034 -0.050
##
## [[10]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 -0.010 -0.019 -0.026 0.024 0.016 -0.007 -0.011 -0.016 0.024 -0.033
## 11 12 13 14 15 16 17 18 19 20 21
## -0.040 -0.031 -0.014 0.010 0.004 -0.010 0.012 0.017 -0.006 0.022 -0.002
## 22 23 24 25 26 27 28 29 30 31 32
## 0.009 0.006 0.001 -0.023 -0.017 0.010 -0.010 0.039 -0.011 -0.014 -0.001
## 33 34
## 0.033 -0.015
##
## [[11]]
##
## Autocorrelations of series 'res[, i]', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 -0.018 -0.035 0.018 0.019 0.013 -0.016 -0.001 0.000 0.008 0.016
## 11 12 13 14 15 16 17 18 19 20 21
## -0.004 0.000 -0.020 0.006 0.007 0.030 -0.030 -0.015 0.030 0.021 -0.007
## 22 23 24 25 26 27 28 29 30 31 32
## 0.013 0.003 -0.032 0.019 0.013 -0.011 -0.019 -0.033 0.022 0.003 -0.002
## 33 34
## -0.013 -0.022
The combination of traceplots, lag-1 scatterplots, and acf plots suggest the chain for each parameters converges, and that 50000 iterations is sufficient. The Rhat’s are all close to 1, which is another indicator of converge. All of the effective sample sizes are greater than 200, except for that of beta_0 (the intercept) and r.
As Stander et al (2018) pointed out, following this Weibull distribution, \(log(T_i)\) is equal in distribution to \(\frac{1}{r} (\beta_0 + \beta_1x_{i1}+\beta_2x_{i2}+...+\beta_9x_{i1}x_{i5}) + \frac{1}{r}\log(\epsilon)\) where \(\epsilon \sim exp(1)\). \(\alpha_j = -\beta_j/r\), \(j = 1,2,...,9\). The interpretation of coefficients depends on the interaction terms (i.e. both year of birth and the type of leadership).
For example, if a Pope is born one year later, he is expected to live longer by a multiplicative factor of \(exp(\alpha_1)\), or his lifespan is expected to increase by a percentage of \(100*exp(\alpha_1-1)\). If a U.S. President is born one year later, he is expected to live longer by a multiplicative factor of \(exp(\alpha_1 + \alpha_6)\); for a U.S. President and a Chinese Emperor were born in the same year \(y\), the Chinese emperor is expected to live longer by a multiplicative factor of \(exp(\alpha_3 - \alpha_2 + (\alpha_7-\alpha_6)*y)\) and so on.
people = c("age_Francis_predictive", "age_Obama_predictive", "age_Dalai_predictive",
"age_Naruhito_predictive")
four_pred = data.frame(model_output$BUGSoutput$sims.matrix)[,people]
print(paste("P(Dalai Lifespan > Francis Lifespan)=", mean(four_pred$age_Dalai_predictive>four_pred$age_Francis_predictive)))
## [1] "P(Dalai Lifespan > Francis Lifespan)= 0.480666666666667"
print(paste("P(Obama Lifespan > Naruhito Lifespan)=", mean(four_pred$age_Obama_predictive>four_pred$age_Naruhito_predictive)))
## [1] "P(Obama Lifespan > Naruhito Lifespan)= 0.483666666666667"
insert model output summary here
Our model predicts that the 85-year-old 14th Dalai Lama is expected to live 92.4 years (95% credible interval: 85.4 - 99.6 years), the 84-year-old Pope Francis is expected to live 92.4 years (95% credible interval: 84.2 - 99.4 years), the 60-year-old Janpanese Emperior Naruhito is expected to live 85.3 years (95% credible interval: 66.4 - 99.5 years) and the 60-year-old former President Barack Obama is expected to live 84.8 years (95% credible interval: 61.5 - 99.4 years).
francis_dalai = data.frame(plifespans = c(four_pred$age_Francis_predictive, four_pred$age_Dalai_predictive), person = c(rep("Francis", nrow(four_pred)), rep("Dalai", nrow(four_pred))))
mins = francis_dalai %>%
group_by(person) %>%
summarise(min_vals = round(min(plifespans), 2))
## `summarise()` ungrouping output (override with `.groups` argument)
meds = francis_dalai %>%
group_by(person) %>%
summarise(med_vals = round(median(plifespans), 2))
## `summarise()` ungrouping output (override with `.groups` argument)
val95 = francis_dalai %>%
group_by(person) %>%
summarise(vals_95 = round(quantile(plifespans, 0.95), 2))
## `summarise()` ungrouping output (override with `.groups` argument)
francis_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Francis"], "yrs"),
paste("Median:", meds$med_vals[meds$person=="Francis"],"yrs"),
paste("95% quantile:", val95$vals_95[val95$person=="Francis"], "yrs"),
sep = "\n")
dalai_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Dalai"], "yrs"),
paste("Median:", meds$med_vals[meds$person=="Dalai"],"yrs"),
paste("95% quantile:", val95$vals_95[val95$person=="Dalai"], "yrs"),
sep = "\n")
plot_txt <- data.frame(
label = c(francis_txt, dalai_txt),
person = c("Francis", "Dalai")
)
ggplot(francis_dalai, aes(x=plifespans)) +
geom_histogram(aes(y=..density..), binwidth = 0.8, boundary = round(min(francis_dalai$plifespans), 2)) +
facet_wrap(~person) +
geom_vline(data=mins, aes(xintercept=min_vals), color = "black")+
geom_vline(data=meds, aes(xintercept=med_vals), color = "blue")+
geom_vline(data=val95, aes(xintercept=vals_95), color = "red")+
geom_hline(yintercept = 0, colour = "black") +
xlab("Lifespan (yrs)") + ylab("Predictive Probability density") +
ylim(c(0, 0.1)) +
geom_text(data = plot_txt, mapping = aes(x = 88.5, y = 0.09, label = label), size = 2)
obama_naru = data.frame(plifespans = c(four_pred$age_Obama_predictive, four_pred$age_Naruhito_predictive), person = c(rep("Obama", nrow(four_pred)), rep("Naruhito", nrow(four_pred))))
mins = obama_naru %>%
group_by(person) %>%
summarise(min_vals = round(min(plifespans), 2))
## `summarise()` ungrouping output (override with `.groups` argument)
meds = obama_naru %>%
group_by(person) %>%
summarise(med_vals = round(median(plifespans), 2))
## `summarise()` ungrouping output (override with `.groups` argument)
val95 = obama_naru %>%
group_by(person) %>%
summarise(vals_95 = round(quantile(plifespans, 0.95), 2))
## `summarise()` ungrouping output (override with `.groups` argument)
obama_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Obama"], "yrs"),
paste("Median:", meds$med_vals[meds$person=="Obama"],"yrs"),
paste("95% quantile:", val95$vals_95[val95$person=="Obama"], "yrs"),
sep = "\n")
naru_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Naruhito"], "yrs"),
paste("Median:", meds$med_vals[meds$person=="Naruhito"],"yrs"),
paste("95% quantile:", val95$vals_95[val95$person=="Naruhito"], "yrs"),
sep = "\n")
plot_txt <- data.frame(
label = c(obama_txt, naru_txt),
person = c("Obama", "Naruhito")
)
ggplot(obama_naru, aes(x=plifespans)) +
geom_histogram(aes(y=..density..), binwidth = 0.8,boundary = round(min(obama_naru$plifespans), 2)) +
facet_wrap(~person) +
geom_vline(data=mins, aes(xintercept=min_vals), color = "black")+
geom_vline(data=meds, aes(xintercept=med_vals), color = "blue")+
geom_vline(data=val95, aes(xintercept=vals_95), color = "red")+
geom_hline(yintercept = 0, colour = "black") +
xlab("Lifespan (yrs)") + ylab("Predictive Probability density") +
ylim(c(0, 0.06)) +
geom_text(data = plot_txt, mapping = aes(x = 72, y = 0.05, label = label), size = 3)
The posterior predictive distribution of the lifespan of the 14th Dalai Lama is more uniform, with mode in the 90s. The posterior predictive distribution of the lifespan of Pope Francis is somewhat left skewed, with the mode in the early 90s. The posterior predictive distributions of the lifespans of President Obama and Emperor Naruhito are both left skewed, with the modes in the late 90s.
The probability that the 14th Dalai Lama will have a longer lifespan than Pope Francis is mean(four_pred$age_Dalai_predictive>four_pred$age_Francis_predictive)). The probability that President Obama will have a longer lifespan than Emperor Naruhito is 0.483.
# select betas and r from output and put into a tidy table
vars = c("beta_0", "beta_1", "beta_2", "beta_3", "beta_4", "beta_5", "beta_6",
"beta_7", "beta_8", "beta_9", "r")
output <- data.frame(model_output$BUGSoutput$summary)
output <- output[row.names(output) %in% vars, ]
coef <- output %>%
mutate("Variable" = head(vars,16),
"Mean" = output$mean,
"Standard Deviation" = output$sd,
"2.5% Quantile" = output$X2.5.,
"Median" = output$X50.,
"97.5% Quantile" = output$X97.5.
) %>%
select("Variable", "Mean", "Standard Deviation", "2.5% Quantile", "Median", "97.5% Quantile")
kable(coef)
| Variable | Mean | Standard Deviation | 2.5% Quantile | Median | 97.5% Quantile |
|---|---|---|---|---|---|
| beta_0 | -22.5915879 | 1.4680033 | -25.5221058 | -22.5730566 | -19.7465000 |
| beta_1 | -0.0016443 | 0.0007534 | -0.0031519 | -0.0016419 | -0.0002007 |
| beta_2 | 0.6059671 | 0.4230495 | -0.2422546 | 0.6059599 | 1.4371459 |
| beta_3 | 1.3355461 | 0.2587216 | 0.8319711 | 1.3420033 | 1.8255342 |
| beta_4 | 1.6900917 | 0.3171348 | 1.0439544 | 1.7043228 | 2.2750221 |
| beta_5 | 0.7206409 | 0.2366567 | 0.2509796 | 0.7212037 | 1.1769018 |
| beta_6 | -0.0020307 | 0.0024308 | -0.0067542 | -0.0019584 | 0.0026365 |
| beta_7 | 0.0004541 | 0.0013653 | -0.0022375 | 0.0004524 | 0.0031630 |
| beta_8 | 0.0067854 | 0.0017555 | 0.0033960 | 0.0067710 | 0.0104489 |
| beta_9 | 0.0002448 | 0.0012659 | -0.0023251 | 0.0002513 | 0.0026518 |
| r | 4.1947291 | 0.2561452 | 3.6925186 | 4.1820203 | 4.7225935 |
# calculate and create graphs for standardized residuals
#select estimated mean betas
beta_0_est <- coef[1,2]
beta_1_est <- coef[2,2]
beta_2_est <- coef[3,2]
beta_3_est <- coef[4,2]
beta_4_est <- coef[5,2]
beta_5_est <- coef[6,2]
beta_6_est <- coef[7,2]
beta_7_est <- coef[8,2]
beta_8_est <- coef[9,2]
beta_9_est <- coef[10,2]
# select only non-living leaders to calculate residuals for
no_living <- leaders_data %>%
filter(Censored == 0)
# mutate data to calculate mus
estimates <- no_living %>%
mutate(Birth.Year.Cent = Birth.Year - mean(leaders_nopred$Birth.Year)) %>%
mutate(ChinaEmp =case_when(Type == "ChinaEmp" ~ 1, TRUE ~ 0),
DalaiLama =case_when(Type == "DalaiLama" ~ 1, TRUE ~ 0),
JapanEmp =case_when(Type == "JapanEmp" ~ 1, TRUE ~ 0),
Pope =case_when(Type == "Pope" ~ 1, TRUE ~ 0),
UsPres =case_when(Type == "UsPres" ~ 1, TRUE ~ 0)) %>%
mutate(BYUsPres = Birth.Year.Cent*UsPres) %>%
mutate(BYChinaEmp = Birth.Year.Cent*ChinaEmp) %>%
mutate(BYDalaiLama = Birth.Year.Cent*DalaiLama) %>%
mutate(BYJapanEmp = Birth.Year.Cent*JapanEmp) %>%
mutate(intercept = 1) %>%
mutate(predicted = exp(beta_0_est + beta_1_est*(Birth.Year.Cent) + beta_2_est*(UsPres) + beta_3_est*(ChinaEmp) + beta_4_est*(DalaiLama) + beta_5_est*(JapanEmp) + beta_6_est*(UsPres)*(Birth.Year.Cent) + beta_7_est*(ChinaEmp)*(Birth.Year.Cent) + beta_8_est*(DalaiLama)*(Birth.Year.Cent) + beta_9_est*(JapanEmp)*(Birth.Year.Cent)))
data.mat = as.matrix(estimates[,c("intercept", "Birth.Year.Cent", "UsPres", "ChinaEmp", "DalaiLama", "JapanEmp","BYUsPres", "BYChinaEmp", "BYDalaiLama", "BYJapanEmp")], nrow = nrow(estimates), byrow = TRUE)
pred_res = data.frame(t(model_output$BUGSoutput$summary))
r = pred_res$r[1]
betas = pred_res[,c("beta_0", "beta_1", "beta_2", "beta_3", "beta_4", "beta_5",
"beta_6", "beta_7", "beta_8","beta_9")][1,]
beta.mat = as.matrix(betas, nrow = p, byrow = TRUE)
logmus = data.mat %*% t(beta.mat)
# install.packages("TruncatedDistributions", repos="http://R-Forge.R-project.org")
library(TruncatedDistributions)
# draw predicted values based on mus calculated above
set.seed(123)
Ts = rtweibull(166, r, (1/exp(logmus))^(1/r), 0, 100)
## Warning in (1 - x) * Fa: longer object length is not a multiple of shorter
## object length
## Warning in x * Fb: longer object length is not a multiple of shorter object
## length
#calculate residuals
resid = estimates$Age.Event - Ts
#calculate standard deviations
meanTs = mean(Ts)
diff = Ts - meanTs
sd = sqrt((1/166)*(sum((diff)^2)) * (1+ (1/166) + (diff)^2/sum((diff)^2)))
# make plots
ggplot(data = estimates, aes(x = Ts, y = resid/sd)) + geom_point() +
labs(x = "Predicted Lifespan", y = "Standardized Residual")
ggplot(data = estimates, aes(x = Birth.Year, y = resid/sd)) + geom_point() +
labs(x = "Birth Year", y = "Standardized Residual")
# non_censored_prediction <- filter(leaders_data, Censored == 0)
# K <- 10
# ysims <- matrix(nrow = 166, ncol = K)
#
# for(k in 1:K){
# for(i in 1:nrow(non_censored_prediction)){
# w <- sample(nrow(res), 1)
# samp <- res[w,]
# val <- non_censored_prediction[i, ]
# beta_temp <- samp$beta_0 + samp$beta_1*val$Birth.Year + samp$beta_2*as.integer(val$TypeUsPres) + samp$beta_3*as.integer(val$TypeChinaEmp) + samp$beta_4*as.integer(val$TypeDalaiLama) + samp$beta_5*as.integer(val$TypeJapanEmp) + samp$beta_6*val$Birth.Year*as.integer(val$TypeUsPres) + samp$beta_7*val$Birth.Year*as.integer(val$TypeChinaEmp) + samp$beta_8*val$Birth.Year*as.integer(val$TypeDalaiLama) + samp$beta_9*val$Birth.Year*as.integer(val$TypeJapanEmp)
# mu <- exp(beta_temp)
# t <- rtweibull(1, samp$r, (1/mu)^(1/samp$r), 50, 100)
# ysims[i, k] <- t
# }
#}
# r <- sample(nrow(non_censored_prediction), 1)
# hist(ysims[,1])
# abline(v=non_censored_prediction$Age.Event[r], col = "red")
# print(mean(ysims<0))
# print(mean(non_censored_prediction$Age.Event <0))
#lg <- lexis_grid(1960, 2000, 30, 80, delta=5)
#leaders_data$Initial.Age <- year(leaders_data$Event.Date) - leaders_data$Birth.Year
#subset <- filter(leaders_data, Censored == 0)
#subset <- subset[0:5,]
#ggplot() + geom_polygon(data = subset, mapping = aes(x = Birth.Year, y = Age.Event, group = Name)) + geom_point(data = subset, aes(x = Birth.Year, y = Birth.Year + Age.Event)) + geom_polygon(data = subset, mapping = aes(x = Birth.Year, y = ))
subset <- filter(leaders_data, Censored == 0)
ggplot(data = subset, aes(x = Birth.Year, Age.Event)) + geom_point(color="pink") + ggtitle("Distribution of Leaders' Ages") + labs(x = "Birth Year", y = "Age")
We sought to compare Popes, US Presidents, Dalai Lamas, Chinese Emperors, and Japanese Emperors to see how their lifespans compare. Our model has found that the impact of birth year does not change based on leadership type. Additionally, our model has found that lifespan does not depend on year of birth.
This model could be improved by including more predictors; for example, health widely varies among people, and could lead to a better model for prediction. For further implementations of this research question, it would be valuable to interpret economic data as it concerns the different countries that the leaders grew up in, as the conditions that a person live in lead to changes in a person’s lifespan.
cite Stander et al